In [54]:
data analysis
data info:
number of numeric columns: 17
number of categorical columns: 4
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 21613 entries, 0 to 21612
Data columns (total 21 columns):
 #   Column                                     Non-Null Count  Dtype  
---  ------                                     --------------  -----  
 0   ID                                         21613 non-null  int64  
 1   Date House was Sold                        21613 non-null  object 
 2   Sale Price                                 21609 non-null  float64
 3   No of Bedrooms                             21613 non-null  int64  
 4   No of Bathrooms                            21609 non-null  float64
 5   Flat Area (in Sqft)                        21604 non-null  float64
 6   Lot Area (in Sqft)                         21604 non-null  float64
 7   No of Floors                               21613 non-null  float64
 8   Waterfront View                            21613 non-null  object 
 9   No of Times Visited                        2124 non-null   object 
 10  Condition of the House                     21613 non-null  object 
 11  Overall Grade                              21613 non-null  int64  
 12  Area of the House from Basement (in Sqft)  21610 non-null  float64
 13  Basement Area (in Sqft)                    21613 non-null  int64  
 14  Age of House (in Years)                    21613 non-null  int64  
 15  Renovated Year                             21613 non-null  int64  
 16  Zipcode                                    21612 non-null  float64
 17  Latitude                                   21612 non-null  float64
 18  Longitude                                  21612 non-null  float64
 19  Living Area after Renovation (in Sqft)     21612 non-null  float64
 20  Lot Area after Renovation (in Sqft)        21613 non-null  int64  
dtypes: float64(10), int64(7), object(4)
memory usage: 3.5+ MB
None
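The column counts and the `info()` report above can be reproduced with a few pandas calls. The following is a minimal sketch, not the notebook's actual code: the small `df` is a hypothetical stand-in for the housing data, and treating every non-numeric column as categorical is an assumption. A trailing `None` after an `info()` report usually comes from wrapping the call in `print()`, since `info()` prints its report itself and returns `None`.

```python
import pandas as pd

# Hypothetical miniature frame standing in for the housing data.
df = pd.DataFrame({
    "Sale Price": [221900.0, 538000.0, None],
    "No of Bedrooms": [3, 3, 2],
    "Waterfront View": ["No", "No", "Yes"],
})

# Numeric columns are the int/float dtypes; everything else is
# treated as categorical here (an assumption of this sketch).
numeric_cols = df.select_dtypes(include="number").columns
categorical_cols = df.select_dtypes(exclude="number").columns

print("number of numeric columns:", len(numeric_cols))
print("number of categorical columns:", len(categorical_cols))

# info() writes the report to stdout and returns None,
# so it is called directly rather than inside print().
df.info()
```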
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
number of missing values in each column:
ID                                               0
Date House was Sold                              0
Sale Price                                       4
No of Bedrooms                                   0
No of Bathrooms                                  4
Flat Area (in Sqft)                              9
Lot Area (in Sqft)                               9
No of Floors                                     0
Waterfront View                                  0
No of Times Visited                          19489
Condition of the House                           0
Overall Grade                                    0
Area of the House from Basement (in Sqft)        3
Basement Area (in Sqft)                          0
Age of House (in Years)                          0
Renovated Year                                   0
Zipcode                                          1
Latitude                                         1
Longitude                                        1
Living Area after Renovation (in Sqft)           1
Lot Area after Renovation (in Sqft)              0
dtype: int64
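Per-column missing counts like the ones above are typically obtained with `isnull().sum()`. The sketch below uses a hypothetical toy frame and also computes the missing share per column, which makes a case like "No of Times Visited" (19489 of 21613 rows missing, about 90%) stand out immediately.

```python
import pandas as pd

# Hypothetical toy frame; the values are illustrative only.
df = pd.DataFrame({
    "Sale Price": [221900.0, None, 180000.0, 604000.0],
    "No of Times Visited": [None, None, None, "Once"],
    "No of Bedrooms": [3, 3, 2, 4],
})

# Count of missing entries per column.
missing = df.isnull().sum()

# Fraction of missing entries per column (mean of the boolean mask).
missing_share = df.isnull().mean()

print("number of missing values in each column:")
print(missing)
print(missing_share.round(2))
```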
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
treating the missing values:
missing values per column after treatment:
ID                                           0
Date House was Sold                          0
Sale Price                                   0
No of Bedrooms                               0
No of Bathrooms                              0
Flat Area (in Sqft)                          0
Lot Area (in Sqft)                           0
No of Floors                                 0
Waterfront View                              0
Condition of the House                       0
Overall Grade                                0
Area of the House from Basement (in Sqft)    0
Basement Area (in Sqft)                      0
Age of House (in Years)                      0
Renovated Year                               0
Zipcode                                      0
Latitude                                     0
Longitude                                    0
Living Area after Renovation (in Sqft)       0
Lot Area after Renovation (in Sqft)          0
dtype: int64
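The output shows that "No of Times Visited" no longer appears and every remaining column has zero missing values, while the row count stays at 21613, so the gaps were filled rather than dropped. The exact method is not shown. One plausible approach, sketched here under assumptions, is to drop columns that are mostly empty and fill numeric gaps with the column median. The 50% threshold, the median choice, and the toy `df` are all hypothetical.

```python
import pandas as pd

# Hypothetical toy frame; the values are illustrative only.
df = pd.DataFrame({
    "Sale Price": [221900.0, None, 180000.0, 604000.0],
    "No of Times Visited": [None, None, None, "Once"],
    "Zipcode": [98178.0, 98125.0, None, 98136.0],
})

# Assumed rule: drop any column with more than half its values missing.
threshold = 0.5
mostly_missing = df.columns[df.isnull().mean() > threshold]
df = df.drop(columns=mostly_missing)

# Assumed rule: fill remaining numeric gaps with the column median.
numeric_cols = df.select_dtypes(include="number").columns
df[numeric_cols] = df[numeric_cols].fillna(df[numeric_cols].median())

print(df.isnull().sum())
```

For an identifier-like column such as Zipcode, filling with the most frequent value may be more meaningful than a median; the median is used here only to keep the sketch short.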
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
statistical information:
                 ID    Sale Price  No of Bedrooms  No of Bathrooms  \
count  2.161300e+04  2.161300e+04    21613.000000     21613.000000   
mean   4.580302e+09  5.401984e+05        3.370842         2.114732   
std    2.876566e+09  3.673550e+05        0.930062         0.770067   
min    1.000102e+06  7.500000e+04        0.000000         0.000000   
25%    2.123049e+09  3.220000e+05        3.000000         1.750000   
50%    3.904930e+09  4.500000e+05        3.000000         2.250000   
75%    7.308900e+09  6.450000e+05        4.000000         2.500000   
max    9.900000e+09  7.700000e+06       33.000000         8.000000   

       Flat Area (in Sqft)  Lot Area (in Sqft)  No of Floors  Overall Grade  \
count         21613.000000        2.161300e+04  21613.000000   21613.000000   
mean           2079.931772        1.510776e+04      1.494309       7.623467   
std             918.296332        4.141964e+04      0.539989       1.105439   
min             290.000000        5.200000e+02      1.000000       1.000000   
25%            1430.000000        5.040000e+03      1.000000       7.000000   
50%            1910.000000        7.620000e+03      1.500000       7.000000   
75%            2550.000000        1.070100e+04      2.000000       8.000000   
max           13540.000000        1.651359e+06      3.500000      10.000000   

       Area of the House from Basement (in Sqft)  Basement Area (in Sqft)  \
count                               21613.000000             21613.000000   
mean                                 1788.344193               291.509045   
std                                   827.925135               442.575043   
min                                   290.000000                 0.000000   
25%                                  1190.000000                 0.000000   
50%                                  1560.000000                 0.000000   
75%                                  2210.000000               560.000000   
max                                  9410.000000              4820.000000   

       Age of House (in Years)  Renovated Year       Zipcode      Latitude  \
count             21613.000000    21613.000000  21613.000000  21613.000000   
mean                 46.994864       84.402258  98077.937766     47.560048   
std                  29.373411      401.679240     53.504187      0.138562   
min                   3.000000        0.000000  98001.000000     47.155900   
25%                  21.000000        0.000000  98033.000000     47.471000   
50%                  43.000000        0.000000  98065.000000     47.571800   
75%                  67.000000        0.000000  98118.000000     47.678000   
max                 118.000000     2015.000000  98199.000000     47.777600   

          Longitude  Living Area after Renovation (in Sqft)  \
count  21613.000000                            21613.000000   
mean    -122.213892                             1986.538914   
std        0.140827                              685.388397   
min     -122.519000                              399.000000   
25%     -122.328000                             1490.000000   
50%     -122.230000                             1840.000000   
75%     -122.125000                             2360.000000   
max     -121.315000                             6210.000000   

       Lot Area after Renovation (in Sqft)  
count                         21613.000000  
mean                          12768.455652  
std                           27304.179631  
min                             651.000000  
25%                            5100.000000  
50%                            7620.000000  
75%                           10083.000000  
max                          871200.000000  
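A summary table this wide wraps across several blocks when printed. As a sketch on a hypothetical toy frame, transposing the result of `describe()` puts one column of the data on each row, which is easier to scan for suspicious values such as a maximum of 33 bedrooms.

```python
import pandas as pd

# Hypothetical toy frame; the values are illustrative only.
df = pd.DataFrame({
    "Sale Price": [221900.0, 538000.0, 180000.0, 604000.0],
    "No of Bedrooms": [3, 3, 2, 33],
    "Waterfront View": ["No", "No", "Yes", "No"],
})

# describe() summarizes only the numeric columns of a mixed frame;
# .T turns each summarized column into a row.
summary = df.describe().T
print(summary)
```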
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
data visualization:
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
pairwise correlation:
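No correlation output is shown here. A minimal sketch of how a pairwise (Pearson) correlation matrix might be computed follows, assuming only numeric columns are included. The toy `df` and its values are hypothetical.

```python
import pandas as pd

# Hypothetical toy frame; the values are illustrative only.
df = pd.DataFrame({
    "Flat Area (in Sqft)": [1000.0, 1500.0, 2000.0, 2500.0],
    "Sale Price": [200000.0, 300000.0, 400000.0, 500000.0],
    "Waterfront View": ["No", "No", "Yes", "No"],
})

# Restrict to numeric columns first, then compute Pearson correlations.
corr = df.select_dtypes(include="number").corr()
print(corr)
```

Selecting the numeric columns explicitly avoids relying on how a particular pandas version handles non-numeric columns in `corr()`.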